Speed for the Apple //, and breaking the 20Mhz barrier
by Richard Bennett
Copyright (c) 1990 Apple Users' Group, Sydney
Republished from Applecations, a publication of the Apple Users' Group, Sydney, Australia.


Yes, there is still some speed left in the old beast after all. The Apple // is on the verge of becoming a VERY fast machine. But first, I'd like to thank Cameron Brawn (and the AUG) for the documentation on the club's 4Mhz ZIP chip which runs the BBS, Darren Langer for the documentation on his RocketChip, and Cameron once again for the documentation on his TransWarp GS (before I bought mine). Unfortunately however, the club's ZIP chip died before I got a chance to have a good look at it.
Have you ever tried writing time-critical code for the Apple //? In the old days it was easy, as every machine ran at roughly 1Mhz. These days, with so many different chip speeds, you have to be careful. Even Apple themselves have 5 different CPU speeds across the Apple // range (1 Mhz slow RAM in the ][+, //e and //GS; 1 Mhz fast RAM in the //GS; 2.8 Mhz slow RAM in the //GS; 2.8 Mhz fast RAM in the //GS; and 4 Mhz slow RAM in the //c+). If your timings aren't that critical, you can ignore the differences between fast RAM and slow RAM, and you've still got 3 different speeds. Now add to this the three TransWarps for the //e (all different speeds), the ZIP chip //e (two top speeds), the RocketChip //e (two top speeds), the TransWarp //GS (two top speeds), and the new ZIP chip //GS. This last chip is supposed to be shipping at the end of June 1990, but then again it's ZIP technology, so....
Luckily, all these chips can be slowed down to a more uniform speed (such as 1 Mhz). But unless you code routines to look for every one of them (they are all controlled differently!), you're still not going to get the same speed on every machine. Even then, the routine to actually slow the chip down would probably be longer than the entire routine you're trying to write!

^The ZIP chip^
The ZIP chip is controlled via a bank of softswitches in the hardware page. The chip watches its address lines for these addresses, and responds to them as if the registers were part of the machine. The ZIP chip registers range from $C05A (normally set annunciator 1) to $C05F (clear annunciator 3 and double hires off, depending on IOUDIS being on of course). To talk to the ZIP, you first have to tell it that you actually want to talk to it. To do this, you write the value $5A four times to the ZIP lock/unlock switch at $C05A. This will unlock the ZIP ready for your commands.

        LDA  #$5A     ;Unlocking value
        STA  $C05A    ;Stick it 4 times
        STA  $C05A
        STA  $C05A
        STA  $C05A

To slow down to standard 1Mhz, simply write an invalid value to the lock/unlock register whilst it's unlocked. So, following the previous section of code, you could do this;

        LDA  #0       ;Invalid value
        STA  $C05A    ;Slow down to 1Mhz

This still leaves the ZIP unlocked, so your time dependent code should either lock the ZIP first, or make sure that neither it nor any code it calls (including ROM) goes anywhere near the ZIP registers. To speed it back up to its original speed, make sure the ZIP is unlocked, and then do a write to location $C05B;

        STA  $C05B    ;Speed up again

Of course you must now re-lock the ZIP again;

        LDA  #$A5     ;This is the locking value
        STA  $C05A    ;Stick it in the lock/unlock register

Once the ZIP is locked, all accesses to its registers are ignored, except for a sequence of four $5As written to the lock/unlock register. Changing the speed of the ZIP involves picking any of its twenty speed settings, which range from 0.6667 to 4.0 megahertz (or to 8Mhz for the latest ZIP), and sticking the appropriate index value into the speed register at $C05D (whilst the ZIP is unlocked).
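
As a rough sketch of that last step, the sequence below unlocks the ZIP, writes a speed index, and locks it again, using only the register addresses and values given above. SPDIDX is a hypothetical variable holding one of the twenty index values; the values themselves are not listed in this article, so take them from the ZIP documentation. Only run something like this on a machine known to have a ZIP, since these addresses are otherwise the annunciator softswitches;

ZIPSPD  LDA  #$5A     ;Unlocking value
        STA  $C05A    ;Four writes to unlock
        STA  $C05A
        STA  $C05A
        STA  $C05A
        LDA  SPDIDX   ;Speed index (hypothetical variable, see above)
        STA  $C05D    ;Stick it in the speed register
        LDA  #$A5     ;Locking value
        STA  $C05A    ;Lock the ZIP again
        RTS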

^The RocketChip^
The RocketChip has a maximum speed of 5Mhz, which is not surprising considering the engineer behind it originally worked with ZIP technologies. It also has twenty speed settings, this time getting down to 50Khz as opposed to the ZIP's 500Khz. I have yet to see the technical manual on the RocketChip, so I can't really elaborate on how to program it. Suffice to say that the speed selection emulates the TransWarp protocols, which is one of the reasons why Applied Engineering had licensed technology from BPT (Bits and Pieces Technology Inc., who make the RocketChip) in its new TransWarp II. Of course when BPT lost the court case with ZIP, Applied Engineering had to quickly re-design the TransWarp II into the TransWarp III, which brings us nicely to the TransWarp.

^The TransWarp^
The TransWarp is configured by using dip switches on the card. The only hardware register in memory is at $C074, and is purely a speed register. This location can contain three different values, which you can read or write at any time;

        0 = Fastest speed (selected by switches on card, usually 3.6Mhz)
        1 = Normal 1Mhz
        3 = Disable the TransWarp completely until next cold-boot

Theoretically, by reading from $C074 and checking for a 0, 1 or 3, you could tell if a TransWarp is installed. The manual doesn't mention any way of recognising the card, so if anyone has a TransWarp that works (ie. mine doesn't work), please let us know. Sample code for testing for a TransWarp would then be something like this;

        LDA  $C074    ;Get speed register if possible
        CMP  #2       ;Two is invalid
        BEQ  NOTW
        AND  #3       ;Strip off all but bits 0 and 1
        CMP  $C074    ;Same as before?
        BNE  NOTW
YESTW   EQU  *

NOTW    EQU  *

This method ensures that the location is read at least twice to verify the TransWarp's existence. If a TransWarp is not installed, this location should not only be garbage, but different garbage on each read. The likelihood of getting a value below $40 is pretty remote, let alone getting the same value twice below 4.

So, to slow down the TransWarp;

        LDA  $C074    ;Get current speed
        STA  SAVESP   ;Save it
        LDA  #1
        STA  $C074    ;Slow down to 1Mhz

And to speed it up again;

        LDA  SAVESP   ;Retrieve the old speed
        STA  $C074    ;Set it again

At the time of writing this article, my attempts to get the documentation on the //c+ had failed. However, rumours of the //c+ containing either a ZIP chip or a ZIP chip hybrid abound. Considering this and Apple's association with Applied Engineering and the TransWarp GS, it would be safe to assume that the //c+ is controlled the same as either the ZIP chip, or the TransWarp (or RocketChip). Those of you requiring this information should contact Apple and see how you go. (Or if anyone has this information already, could you please let me know!)

^GS specific^
For the //GS, there are two methods to slow the machine down. Apart from running your code in bank $E0 or $E1, the Apple engineers have put in a CPU slowdown switch, and a 5.25" slow down sensor. The slowdown switch is at $E0/C036, and in a normal system, is thus shadowed into bank 0 at $00/C036. Bit7 is the select bit.

        LDA  $C036    ;Get current setting
        AND  #$7F     ;Make bit7 = 0 to slow down to 1Mhz
        STA  $C036    ;Stick it back in
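
The code above leaves the machine running slow. A minimal sketch of a matching save and restore pair follows. SAVE36 is a hypothetical one-byte variable, and the code assumes an 8 bit accumulator and that it runs somewhere $C036 is visible (bank 0 with the shadowing described above). If other bits of $C036 could change while you are running slow, restore only bit7 rather than the whole byte;

GSSLOW  LDA  $C036    ;Get current setting
        STA  SAVE36   ;Keep the whole byte for later
        AND  #$7F     ;Make bit7 = 0 to slow down to 1Mhz
        STA  $C036    ;Stick it back in
        RTS

GSFAST  LDA  SAVE36   ;Original setting, including bit7
        STA  $C036    ;Back to whatever speed we started at
        RTS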

Also, whenever the disk drive is activated, the machine is also slowed down. Although this isn't as simple as the above method, with a bit of fiddling you can arrive at the same solution. Considering the drives MUST run at 1Mhz, we can guarantee that activating the drive will slow down ALL CPU accelerators. Although they should all respect $E0/C036 as well, this method still works fine. How the ZIP GS handles slow downs I don't know, but this method simply must work with it!

        LDA  $C02D    ;Get the current slot settings
        PHA           ;Save them
        EOR  #$40     ;Switch to the alternate setting
                      ; ie. If currently internal Disk Port, select the card
                      ;     If currently card, select Disk Port
                      ; This will ensure that the drives do not get activated
        STA  $C02D    ;Set the new slot settings
        LDA  $C0EE    ;Set read mode just in case
        LDA  $C0E9    ;Drive on, //GS is now running at 1Mhz
        PLA           ;Restore original settings
        STA  $C02D    ;Set them back again

Of course to restore it again;

        LDA  $C02D    ;Get it
        PHA           ;Save it
        EOR  #$40     ;Flip it
        STA  $C02D    ;Set the new
        LDA  $C0E8    ;Turn drive off (fake drive that is)
        PLA
        STA  $C02D    ;Restore the setting

Of course with System 5 and its 14 slot architecture, this means that you would either have to disable interrupts, or not use this technique under GS/OS. Either way, the $C036 method is of course preferable. (And if Matt Deatherage finds out, don't dare tell him that I suggested it!)

^TransWarp GS^
The TransWarp GS will respect the //GS speed register and slow down to 1Mhz whenever the system requires it, and of course speed up again once the speed register is restored to high speed. This means that most programs can simply slow down the //GS via $E0/C036, perform their time-critical code, and then restore $E0/C036 again. TransWarp GS also has an IRQ slow down feature, which can come in handy. As an option, you can tell TransWarp GS to slow down to the current //GS speed whenever the interrupt bit in the status register is set. This effectively turns the TransWarp GS off, and enables the $E0/C036 switch to toggle between 2.8Mhz and 1Mhz. So all you need to do is this;

        SEI

Simple huh? However, considering it's only an option, you first have to make sure that it's activated. To do this, you either have to turn on AppleTalk/IRQ in the TransWarp GS desk accessory, or call the EnableIRQLogic routine in the TransWarp GS ROM;

        JSL  $BC/FF38 ;EnableIRQLogic call
        SEI           ;SEI now works fine.

But to speed it up again, you should restore the original setting of the IRQLogic. So, starting with the slow down routine;

GOSLOW  JSL  $BC/FF3C ;GetTWConfig - Returns current config in A
                      ; (Only four bits are actually used)
        STA  TWCONFIG ;Save the current config
        JSL  $BC/FF38 ;EnableIRQLogic
        SEI           ;Slow me down!

And the restore;

GOFAST  CLI           ;Speed me up!
        LDA  TWCONFIG ;Restore the original config settings
        JSL  $BC/FF40 ;SetTWConfig

^Code Timing^
So how does this all affect your time-critical code?
For starters, actually slowing down the chip can be a real pain. Recognising all the different accelerators can be a long-winded process. What we should be able to do instead, is read the speed of all the chips at once, and use that value to generate extra delays in the code. Unfortunately, this requires recognising each of the accelerators as well, so we may as well rule out the chip as part of a solution. Luckily enough, there are other parts of the hardware that generate a constant timer value across all of the machines, and a few extra ones in the //GS.
For the ][+ and //e we can use the vertical blanking period as a timer. In the //c, and a //e with a mouse card, we could use, once again, the vertical blanking period, or a vertical blanking interrupt. In the //GS, we can use any of the following; the 1 second interrupt, the .25 second interrupt, the vertical blanking period, the heartbeat queue (driven from VBL interrupts), the tick counter (driven from the VBL interrupts), and scan-line interrupts.

^VBLs - Vertical Blanks^
The vertical blanking period is the time it takes the electron gun, inside the monitor, to get from the bottom right corner of the screen, to the top left corner of the screen. During this time, the screen display is not updated. This is perfect for re-drawing graphics in memory before the gun starts to re-trace the screen. Using this method, you have approximately 10819-10835 instruction cycles running at 1 Mhz on a 60 Hz screen, which occurs 60 times each second. As you can see, we now have a logical way of timing our code, independent of the actual speed of the CPU.
Bit 7 of location $C019 indicates whether the vertical blanking (VBL) period is currently active or not. This is where we start to have problems. On the pre-//GS machines, a one in bit 7 meant that a VBL was occurring, but on the //GS it's a zero. This means that if we require the current state of the electron gun, we have to code something like this;

        SEC
        JSR  $FE1F    ;Call standard //GS ID routine
        ROR           ;Drag the carry into the accumulator, bit 7
        EOR  $C019    ;Test bit 7 of the VBL indicator
        BPL  :VBLACT  ;Yep, VBL is occurring
        BMI  :NOVBL   ;Nup, must be re-tracing

However, if all we need is a timed delay for as close to one second as we can get, the following code will suffice.

        LDX  #60      ;60 VBLs for 60 Hz (plus or minus up to one re-trace)
VBLOOP  LDA  $C019    ;Wots the gun doing?
        BPL  *-3      ;Wait for it to go high
        LDA  $C019    ;Must be high now!
        BMI  *-3      ;Wait for it to go low again
        DEX           ;Done yet?
        BNE  VBLOOP
        LDA  $C019    ;Wait one more for non-alignment
        BPL  *-3

This routine will delay for one second, and considering that the position of the gun can't be read on pre-//GS machines, an extra poll is performed in case we started just before it went high (and wasted the first BPL). This of course is optional, and shouldn't be required unless you need AT LEAST one second of delay.
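
The same loop can be turned into a subroutine that waits for any number of frames. This is only a sketch along the lines of the code above; VBWAIT is a hypothetical name, and the frame count is passed in X (1 to 255, with 0 waiting for 256 frames). Because each pass waits for bit 7 to go high and then low again, it counts whole frames regardless of which state means "VBL" on a given machine;

VBWAIT  LDA  $C019    ;Entry: X = number of frames to wait
        BPL  VBWAIT   ;Wait for bit 7 to go high
:LOW    LDA  $C019
        BMI  :LOW     ;Wait for it to go low again
        DEX           ;Done yet?
        BNE  VBWAIT
        RTS

Calling it with LDX #30 gives roughly half a second on a 60 Hz screen. On a 50 Hz screen the same count takes proportionally longer, so use 50 per second there instead of 60.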

^Waiting around at $FCA8^
The WAIT routine in the //GS at $FF/FCA8 (shadowed into $00/FCA8), automatically slows the machine down to 1Mhz by changing the speed register at $C036. Once the delay period has ended, it restores the original value. Of course the TransWarp GS will respect this and slow down as well. However, more elaborate routines will require more control of timing than the overheads and limitations of the WAIT routine.
To use WAIT under GS/OS, you would either have to use a _FWEntry call, which incurs the overhead of the Toolbox, or set up a WAIT call routine in bank 0 to set up the appropriate environment before and after the call. For games which use the joystick, it is quite easy to set up and call a routine in bank 0 which calls the monitor paddle routines directly, so why not a WAIT routine as well?
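
For reference, here is a minimal sketch of calling WAIT from 8 bit code running in bank 0 with the ROM visible (under P8 for example). The delay value goes in the accumulator. The figure in the comment comes from the delay formula in the classic Monitor documentation, roughly (26 + 27A + 5A^2)/2 cycles at 1Mhz, so treat it as approximate;

        LDA  #$FF     ;Longest delay, about 166,000 cycles (1/6 second)
        JSR  $FCA8    ;WAIT slows to 1Mhz, delays, then restores $C036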

^Interrupts^
Interrupts can be very handy. Here is a simple 1 second interrupt handler under P8. You can type it in at $300 if you wish;

START   CLC           ;Switch to native 65c816
        XCE
        REP  $20      ;Use long m
        LDA  #TIC
        STAL $E10055  ;Point the 1 second IRQ vector
        SEP  $20      ; to point to me
        LDA  #^TIC
        STAL $E10057
        LDA  #4       ;Turn on 1 second interrupts
        TSB  $C023
        SEC           ;Switch back and exit
        XCE
        RTS
TIC     PHB           ;Tick entry. Save data bank
        PHK           ;Get my bank
        PLB
        LDA  $C030    ;Make a little tick noise
        LDX  #10
LOOP    PHA
        PLA
        DEX
        BPL  LOOP
        LDA  $C030
        LDA  #$40     ;Reset 1 second interrupt
        TSB  $C032
        PLB           ;Restore the data bank
        CLC           ;Tell ROM that I handled it
        RTL           ;Exit

The ROM (and Toolbox) interrupt hooks all roughly work the same way as the above routine. It is usually up to the caller to clear the interrupt, and signal to the ROM if it was handled correctly. The Toolbox routines _IntSource and _SetVector should really be used for interrupt handling.
However, if you need speed, you'll have to access the switches directly, as I did in the above example.
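
To turn the ticking off again, clearing the same enable bit that START set should be enough. This is a sketch only; STOP is a hypothetical label, and it leaves the vector at $E1/0055 pointing at TIC, so either keep TIC in memory or save the original vector contents before START changes them and put them back here;

STOP    LDA  #4       ;Same mask that START used
        TRB  $C023    ;Clear the bit, no more 1 second interrupts
        RTS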

^Apple Offerings^
The latest in speed from Apple, is the new High Speed SCSI card. And because it was on the Apple // first, it's safe to say that all new Macs now use the Apple // SCSI technology. Whilst not a CPU accelerator card, it does accelerate access to SCSI devices. Whilst the old SCSI card was able to use pseudo-DMA, which still uses the CPU for the final transfer of the data, the new card has real DMA. The following figures are for loading 102 x 32k super hires pictures with the FAST! option of my Slide Master program (as demonstrated at the May Apple // meeting). All tests were run with GS/OS RAM cache turned off:

From /RAM5 took 43 seconds
From the new SCSI card took 22 seconds (who needs a TransWarp GS?)
From the new SCSI card with 7Mhz TransWarp GS took 16 seconds

^What the future holds^
As of finishing this article (31st May 1990), there were various rumours floating around about even faster CPUs. ZIP technology and a rating of 20Mhz has been mentioned, as has a 20Mhz TransWarp III. Also, the 21Mhz reverse engineered 65C816 should be released some time in the next couple of months, although a card to handle it may take a bit longer.
If you're interested in speeding up your Apple //, then there are quite a few choices already. With the seemingly impossible 20Mhz barrier apparently about to be broken on the //GS and //e, we can look forward to some pretty powerful machines. For the latest on speed, try coming along to the //GS SIG, or logging on to AUGABBS. We usually mention, amongst many other things, the latest developments in accelerator technology.

THIS CONTENT COPYRIGHT © 2007, APPLE MACINTOSH USERS' GROUP, SYDNEY
Permission has been obtained to make this material available on the Internet.

Permission is hereby granted for non-profit user groups to republish this content.
PLEASE CREDIT THE AUTHOR AND THE SOURCE: Applecations, publication of the Apple Users' Group, Sydney, Australia

THIS PAGE COPYRIGHT © 2007, ANDREW ROUGHAN